Abstract

Background. Two standard sets of criteria are used to evaluate the tumor response of hepatocellular carcinoma (HCC):
RECIST (Response Evaluation Criteria in Solid Tumors) and modified RECIST (mRECIST). The purpose of this study was to
compare these two sets of tumor response evaluation criteria, RECIST version 1.1 and mRECIST, for HCC treated with
transcatheter arterial chemoembolization (TACE).

Methods. The radiological findings of patients who underwent TACE for HCC in a multicenter clinical trial were examined.
Sixty-five lesions in 21 patients treated with TACE without iodized oil were evaluated. For RECIST version 1.1, tumor size
was measured over the entire lesion, including the necrotic part, whereas for mRECIST only the part enhanced during the
arterial phase was measured. Five radiologists independently measured each lesion twice. To evaluate the inter-criteria
reproducibility, the complete response (CR) rate, the response rate, the kappa statistics, and the proportion of agreement
(PA) for response categories were calculated. The same analyses were conducted for inter- and intra-observer
reproducibility.

Results. In the inter-criteria reproducibility study, the CR rate and the response rate obtained using mRECIST (56.9% and
79.7%, respectively) were higher than those obtained using RECIST version 1.1 (9.2% and 43.1%). In the inter- and
intra-observer reproducibility study, mRECIST showed 'almost perfect' agreement, whereas RECIST version 1.1 showed
'substantial' agreement.

Conclusions. Considerable differences in the CR rate and the response rate were observed between the two sets of criteria.
Given its high inter- and intra-observer reproducibility, mRECIST may be more suitable as tumor response criteria in
clinical trials of TACE for HCC.
Introduction

Two standard sets of criteria are used to evaluate the
tumor response of hepatocellular carcinoma (HCC)
treated with loco-regional therapy such as
transcatheter arterial chemoembolization (TACE):
RECIST (Response Evaluation Criteria in Solid Tumors)
(1) and modified RECIST (mRECIST) (2).

RECIST criteria were published by the National
Cancer Institute in 2000 with the objective of unifying
the criteria used for response assessment. These
criteria use a unidimensional measurement of the
longest diameter of tumor lesions and have been
used in most oncology trials. However, a number of
questions and issues arose, leading to the development
of the revised RECIST criteria, version 1.1 (3). The
major changes in version 1.1 concerned the number
of lesions to be assessed, the assessment of pathological
lymph nodes, confirmation of response, disease
progression, and the measurement of necrotic tumors
(i.e. when a lesion that was solid at baseline has
become necrotic in the center, the longest diameter
of the entire lesion should be followed).

In 2000, a panel of experts on HCC from the
European Association for the Study of the Liver
(EASL) agreed that estimating the reduction in viable
tumor volume, identified on contrast-enhanced spiral
computed tomography (CT), should be considered
the optimal method for assessing the local response to
treatment in patients with HCC (4). Since then, most
authors reporting the results of loco-regional therapy
for HCC have evaluated tumor response according to
this recommendation (5,6).

The aforementioned expert panel retained the EASL
concept of viable tumor and adopted unidimensional
instead of bidimensional measurement to determine
the tumor response of target lesions in HCC (7).
These amendments conformed to the American
Association for the Study of Liver Diseases
(AASLD)-Journal of the National Cancer Institute
(JNCI) guidelines and were defined as the 'modified
RECIST (mRECIST)' criteria (2). Thus, the mRECIST
criteria were developed for loco-regional therapies
for HCC, whereas the RECIST version 1.1 criteria
were developed for systemic therapies. Nevertheless,
RECIST version 1.1 is used in many oncology trials,
including trials of loco-regional therapies for HCC.

A study investigating the inter-criteria reproducibility
of the older versions of the criteria (RECIST
version 1.0 and EASL) has been reported (8). Further-
more, a study comparing tumor response according
to the updated criteria (RECIST version 1.1 and
mRECIST) has been published (9). However, to the
best of our knowledge, the inter- and intra-observer
reproducibility of RECIST version 1.1 and mRECIST
has not been investigated.

Standardized criteria for evaluating tumor response
in clinical trials should yield reproducible results
regardless of the investigator. For a surrogate marker
such as tumor response to therapy, both 'precision'
(an observer consistency study) and 'accuracy' (a
validation study comparing the marker with a gold
standard) should be evaluated. From the viewpoint of
'precision', we compared the RECIST version 1.1 and
mRECIST criteria by evaluating their inter- and
intra-observer reproducibility.

The first purpose of the present study was to clarify
the differences in tumor response evaluated using the
two updated sets of criteria (RECIST version 1.1 and
mRECIST) by assessing their inter-criteria reproduc-
ibility. The second purpose was to determine which
set of criteria is better suited for evaluating tumor
response in clinical trials of TACE for HCC by
assessing inter- and intra-observer reproducibility.

Materials and methods 

We analyzed the radiological findings of patients
who underwent pan-hepatic TACE for multiple
HCCs in a multicenter clinical trial. The eligibility
criteria of this trial included untreated, bilobar
multiple HCCs, compensated Child-Pugh A or B
cirrhosis, and the absence of vascular invasion or
extrahepatic spread. TACE was performed using
cisplatin (IA call, Nihon-Kayaku; 35-65 mg/m²)
and gelatin particles without iodized oil. The
present study was conducted in accordance with the
Declaration of Helsinki, and the protocols were
approved by the institutional review board. Written
informed consent for the treatment protocols, including
the secondary use of treatment-associated documents,
was obtained from each patient. Twenty-one patients
were enrolled from 19 July 2005 to 15 May 2007.

Image analysis 

All patients underwent a dynamic study performed
with a multi-slice CT scanner and non-ionic contrast
medium. CT scans were obtained within two weeks
before TACE and one month after TACE. Tumors
were assessed on axial images obtained at 5-mm
intervals during the unenhanced phase, the arterial
phase, and the portal venous or equilibrium phase.

Tumor response evaluation 

Response was assessed according to the RECIST
version 1.1 criteria by measuring the entire lesion,
including the necrotic part. In contrast, the mRECIST
criteria were used to evaluate the lesion while taking
into account tumor necrosis, recognized as non-enhanced
areas. Both sets of criteria use unidimensional
measurement (Figure 1).

According to the RECIST version 1.1 criteria, complete
response (CR) was defined as the disappearance of all
target lesions; partial response (PR) as at least a 30%
decrease in the sum of the longest diameters of the
target lesions; progressive disease (PD) as at least a
20% increase in the sum of the longest diameters of the
target lesions; and stable disease (SD) as neither
sufficient shrinkage to qualify for PR nor sufficient
increase to qualify for PD.
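The RECIST version 1.1 thresholds above can be written as a small classification function. This is a minimal sketch, not a validated implementation: it assumes, as described in the evaluation methods, that the post-treatment sum is compared with the baseline sum, and it omits parts of the full RECIST 1.1 rules (the smallest sum on study as the reference for progression, the additional requirement of an absolute increase of at least 5 mm, new lesions, and non-target disease). The function name classify_recist11 is hypothetical.

```python
def classify_recist11(baseline_sum_mm: float, post_sum_mm: float) -> str:
    """Categorize target-lesion response from sums of longest diameters (mm).

    Simplified sketch: progression is judged against the baseline sum only,
    without the nadir reference or the 5-mm absolute-increase rule.
    """
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum must be positive")
    if post_sum_mm == 0:
        return "CR"  # disappearance of all target lesions
    change = (post_sum_mm - baseline_sum_mm) / baseline_sum_mm
    if change <= -0.30:
        return "PR"  # at least a 30% decrease
    if change >= 0.20:
        return "PD"  # at least a 20% increase
    return "SD"      # neither sufficient shrinkage nor sufficient increase


# Hypothetical example: baseline sum 40 mm, post-treatment sum 28 mm
print(classify_recist11(40.0, 28.0))  # PR
```

A 12-mm reduction from 40 mm is exactly a 30% decrease, so the example falls on the PR boundary and is classified as PR.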

According to the mRECIST criteria, CR was defined
as the absence of enhanced tumor areas during the
arterial phase, reflecting complete tissue necrosis; PR
was defined as at least a 30% decrease and PD as at
least a 20% increase in the sum of the longest
diameters of the enhanced tumor areas; and SD was
defined as in the RECIST version 1.1 criteria.

Figure 1. A: RECIST ver. 1.1: Response was defined according to
a unidimensional measurement of the entire lesion, including the
necrotic part. B: mRECIST: Response was defined according to a
unidimensional measurement of the viable part, excluding the
necrotic part.
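In practice, the two sets of criteria differ mainly in which diameter enters the sum. The sketch below assumes a hypothetical LesionMeasurement record that holds both the entire-lesion diameter and the enhancing (viable) diameter, and applies the same percentage thresholds to each sum. It is illustrative only: it ignores all other elements of either guideline, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass
class LesionMeasurement:
    """Hypothetical per-lesion record of longest diameters in mm."""
    entire_mm: float     # whole lesion, including necrosis (RECIST 1.1)
    enhancing_mm: float  # arterially enhancing (viable) part only (mRECIST)


def categorize(baseline_sum: float, post_sum: float) -> str:
    """Shared thresholds: CR at zero, PR at >=30% decrease, PD at >=20% increase."""
    if baseline_sum <= 0:
        raise ValueError("baseline sum must be positive")
    if post_sum == 0:
        return "CR"
    change = (post_sum - baseline_sum) / baseline_sum
    if change <= -0.30:
        return "PR"
    if change >= 0.20:
        return "PD"
    return "SD"


def compare_criteria(baseline: list[LesionMeasurement],
                     post: list[LesionMeasurement]) -> dict[str, str]:
    """Return the response category under each set of criteria."""
    recist = categorize(sum(m.entire_mm for m in baseline),
                        sum(m.entire_mm for m in post))
    mrecist = categorize(sum(m.enhancing_mm for m in baseline),
                         sum(m.enhancing_mm for m in post))
    return {"RECIST 1.1": recist, "mRECIST": mrecist}


# Hypothetical lesion: solid 25 mm at baseline, 24 mm and fully necrotic after TACE
result = compare_criteria([LesionMeasurement(25.0, 25.0)],
                          [LesionMeasurement(24.0, 0.0)])
print(result)  # {'RECIST 1.1': 'SD', 'mRECIST': 'CR'}
```

The same lesion thus receives different categories under the two sets of criteria, the pattern shown in Figure 2.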

Evaluation methods 

Five observers independently measured the 65 lesions
in the 21 patients, yielding a total of 325 measurements
in the first session. The same five observers then
independently performed a second measurement
session. The sum of the longest diameters of all
target lesions was calculated at baseline and after
treatment. The baseline sum was used as the reference
for calculating the objective tumor response. The
percentage change was calculated from the ratio of
the post-treatment value to the pre-treatment value
and was then classified according to the RECIST
version 1.1 and mRECIST tumor response criteria.
Tumor response was categorized as CR, PR, SD, or
PD under both sets of criteria, and the CR rate and
the response rate (the proportion of CR plus PR)
were calculated.
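As a worked check of these definitions, the sketch below computes the percentage change of a sum of diameters and derives the CR rate and the response rate (CR + PR) from category counts. The counts are those reported in Table II (325 assessments under each set of criteria); the function and variable names are hypothetical.

```python
def percent_change(baseline_sum: float, post_sum: float) -> float:
    """Percentage change of the post-treatment sum relative to baseline."""
    return (post_sum - baseline_sum) / baseline_sum * 100.0


def rates(counts: dict[str, int]) -> tuple[float, float]:
    """CR rate and response rate (CR + PR) as percentages of all assessments."""
    total = sum(counts.values())
    cr_rate = counts["CR"] / total * 100.0
    response_rate = (counts["CR"] + counts["PR"]) / total * 100.0
    return cr_rate, response_rate


# Category counts from Table II
recist_counts = {"CR": 30, "PR": 110, "SD": 180, "PD": 5}
mrecist_counts = {"CR": 185, "PR": 74, "SD": 65, "PD": 1}

for name, counts in (("RECIST 1.1", recist_counts), ("mRECIST", mrecist_counts)):
    cr, rr = rates(counts)
    print(f"{name}: CR rate {cr:.1f}%, response rate {rr:.1f}%")
```

Rounded to one decimal place, this gives 9.2% and 43.1% for RECIST version 1.1 and 56.9% and 79.7% for mRECIST, matching the rates reported in the Results.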

All images were collected from each institution
and submitted to the Japan Interventional Radiology
in Oncology Study Group (JIVROSG) Data Center
via a web-based system.

Analysis of inter-criteria reproducibility 

To examine the inter-criteria reproducibility between
the RECIST version 1.1 and mRECIST criteria, we
estimated the kappa statistics and the proportion of
agreement for the CR, PR, SD, and PD categories
across the five observers. Data from the first
measurement session were used to evaluate the
inter-criteria reproducibility.
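Cohen's kappa compares the observed proportion of agreement p_o with the agreement p_e expected by chance from the marginal totals: kappa = (p_o - p_e) / (1 - p_e). The sketch below applies this formula to the 4 x 4 cross-tabulation in Table III, treating the 325 pooled assessments as a single two-way table. That pooling is an assumption about how the reported value was obtained, and the function name is hypothetical.

```python
def agreement_and_kappa(table: list[list[int]]) -> tuple[float, float]:
    """Observed proportion of agreement and Cohen's kappa for a square table.

    Rows and columns must list the categories in the same order.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement from the products of matching marginal totals
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa


# Table III: rows = mRECIST, columns = RECIST version 1.1 (CR, PR, SD, PD)
table_iii = [
    [30, 89, 64, 2],
    [0, 21, 53, 0],
    [0, 0, 63, 2],
    [0, 0, 0, 1],
]

p_o, kappa = agreement_and_kappa(table_iii)
print(round(kappa, 3))  # 0.149
```

The diagonal holds 115 of the 325 assessments, and the computed kappa of about 0.149 agrees with the value reported for Table III.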

Analysis of inter-observer reproducibility 

To examine the inter-observer reproducibility among
the five observers, we estimated the kappa statistics
and the proportion of agreement. The five observers
formed 10 observer pairs for comparison. Data from
the first measurement session were used to evaluate
the inter-observer reproducibility.

Analysis of intra-observer reproducibility 

The data from the first and second measurement
sessions were compared to assess the intra-observer
reproducibility of each observer. Comparing each
observer's two sessions yielded five pairs for
comparison.
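The pair counts follow from simple combinatorics: five observers give C(5, 2) = 10 inter-observer pairs, and each observer's two sessions give five intra-observer pairs. Multiplying by the 65 lesions gives the 650 and 325 paired assessments reported in the Results. The sketch assumes, as the phrase "collectively shown" in the Results suggests, that paired ratings were pooled across pairs before kappa was computed; the observer labels and the ratings layout are hypothetical.

```python
from itertools import combinations

OBSERVERS = ["A", "B", "C", "D", "E"]  # hypothetical observer labels
N_LESIONS = 65

Key = tuple[str, int]  # (observer, session)

# Inter-observer: each unordered pair of distinct observers, first session only
inter_pairs = [((a, 1), (b, 1)) for a, b in combinations(OBSERVERS, 2)]

# Intra-observer: each observer's first session versus their own second session
intra_pairs = [((obs, 1), (obs, 2)) for obs in OBSERVERS]


def pooled_pairs(ratings: dict[Key, list[str]],
                 pairs: list[tuple[Key, Key]]) -> tuple[list[str], list[str]]:
    """Stack lesion-by-lesion category pairs from every rater pair.

    Returns two aligned lists that can be cross-tabulated for kappa.
    """
    first: list[str] = []
    second: list[str] = []
    for left, right in pairs:
        first.extend(ratings[left])
        second.extend(ratings[right])
    return first, second


print(len(inter_pairs), len(inter_pairs) * N_LESIONS)  # 10 650
print(len(intra_pairs), len(intra_pairs) * N_LESIONS)  # 5 325
```

The pooled lists can then be cross-tabulated by category and summarized with Cohen's kappa and the proportion of agreement.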
Statistics

Kappa statistics were calculated to determine the
agreement between response classifications. Kappa
values can range from -1.0 (complete disagreement)
through 0 (agreement expected by chance) to 1.0
(complete agreement). The strength of agreement
indicated by the kappa values was interpreted
according to published criteria (9). The kappa values
for the two sets of criteria were compared for
statistical significance using a paired t test.
Comparisons between groups were made using the
Fisher exact test. A P value of less than 0.05 was
considered statistically significant. All analyses were
conducted using SPSS (version 17.0).
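The agreement labels used in this paper ('substantial', 'almost perfect') correspond to the widely used Landis and Koch scale. Assuming that scale, and with the band boundaries handled as shown (an illustrative choice), the mapping from a kappa value to a label can be sketched as follows; the function name is hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Strength-of-agreement label for a kappa value (Landis and Koch bands)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


for value in (0.149, 0.628, 0.643, 0.829, 0.900):
    print(value, landis_koch_label(value))
```

Under these bands, the inter-criteria kappa of 0.149 is 'slight', the RECIST version 1.1 values (0.628 and 0.643) are 'substantial', and the mRECIST values (0.829 and 0.900) are 'almost perfect'.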

Results 

Patient population 

Sixty-five untreated lesions in 21 patients treated
with pan-hepatic TACE were evaluated. The patients'
characteristics (Table I) were as follows: median age
(range), 68 years (27-74 years); sex (male/female),
19/2; hepatitis C virus/hepatitis B virus/others,
12/3/6; Child-Pugh A/B, 20/1; total number of
nodules (range), 65 nodules (1-5 nodules); mean
tumor size (range), 20 mm (10-132 mm).

Inter-criteria reproducibility 

The inter-criteria reproducibility of the RECIST
version 1.1 and mRECIST criteria is summarized
in Tables II and III. Five observers measured
65 lesions independently, for a total of 325
measurements. According to the RECIST version 1.1
criteria, the CR rate and the response rate were 9.2%
and 43.1%, respectively; according to the mRECIST
criteria, they were 56.9% and 79.7%, respectively
(Table II).

Among the 185 lesions classified as CR by the
mRECIST criteria, the RECIST version 1.1 criteria
classified 89 as PR, 64 as SD, and 2 as PD
(Table III). The kappa value was 0.149 (95% CI
0.098-0.201), and the proportion of agreement was
35.5% (Table III).

Table I. Patients and characteristics.

Characteristic                   Value
No. of patients                  21
Age, median (range), years       68 (27-74)
Sex (male/female)                19/2
HCV/HBV/others                   12/3/6
Child-Pugh A/B                   20/1
No. of nodules, all (range)      65 (1-5)
Mean tumor size (range), mm      20 (10-132)

HCV = hepatitis C virus; HBV = hepatitis B virus.

Inter-observer reproducibility 

The inter-observer reproducibility among the five
observers was analyzed using data from the first
measurement session; the five observers formed
10 observer pairs for comparison. The results for
these 10 pairs, comprising 650 paired measurements,
are shown collectively in Table IV. For RECIST
version 1.1, the inter-observer kappa value was 0.628
(95% CI 0.571-0.684), and the proportion of
agreement was 78.8%. For mRECIST, the kappa
value was 0.829 (95% CI 0.792-0.866), and the
proportion of agreement was 90.0%.

Intra-observer reproducibility 

The intra-observer reproducibility was analyzed using
data from the first and second measurement sessions;
comparing each observer's two sessions yielded five
pairs for comparison. The results for these five pairs,
comprising 325 paired measurements, are shown
collectively in Table V. For RECIST version 1.1,
the intra-observer kappa value was 0.643 (95% CI
0.565-0.722), and the proportion of agreement was
79.4%. For mRECIST, the kappa value was 0.900
(95% CI 0.858-0.942), and the proportion of
agreement was 94.2%.

Discussion 

An inter-criteria reproducibility study of RECIST
version 1.0 and EASL guidelines and a comparative
study of tumor response according to RECIST and
mRECIST have been reported (8,9). However, neither
report provides information on inter-observer
reproducibility. In addition to performing an
inter-criteria reproducibility study, we also estimated
the inter- and intra-observer reproducibility to
investigate which set of criteria (RECIST version 1.1
or mRECIST) is better suited for tumor response
evaluation in clinical trials of TACE for HCC.

Inter-criteria reproducibility 

An evaluation of tumor response according to
RECIST version 1.0 and EASL guidelines after
loco-regional therapies in patients with HCC has
been reported. RECIST missed all the CRs achieved
through tumor necrosis and underestimated the
extent of partial tumor response resulting from
tissue necrosis (8).

Table II. Inter-criteria reproducibility between RECIST version 1.1 and mRECIST criteria. Number of lesions (%).

Response category          RECIST        mRECIST       P
Complete response          30 (9.2)      185 (56.9)    <0.001
Partial response           110 (33.8)    74 (22.8)
Stable disease             180 (55.4)    65 (20.0)
Progressive disease        5 (1.5)       1 (0.3)
Overall response (a)       140 (43.1)    259 (79.7)    <0.001

(a) Complete response + partial response.
RECIST = Response Evaluation Criteria in Solid Tumors; mRECIST = modified RECIST.

Table III. Inter-criteria reproducibility between RECIST version 1.1 and mRECIST criteria: distribution chart.

                                  RECIST version 1.1
mRECIST                  CR      PR      SD      PD      Total
Complete response        30      89      64       2        185
Partial response          0      21      53       0         74
Stable disease            0       0      63       2         65
Progressive disease       0       0       0       1          1
Total                    30     110     180       5        325

Proportion of agreement = 35.5%. Kappa = 0.149.

In our inter-criteria reproducibility study comparing
RECIST version 1.1 and mRECIST criteria,
similar results were obtained. The CR rate and the
response rate obtained using mRECIST criteria were
higher than those obtained using RECIST version
1.1 criteria (56.9% versus 9.2%, P < 0.001; 79.7%
versus 43.1%, P < 0.001).
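The Methods state that group comparisons used the Fisher exact test (in SPSS). For readers who want to re-check these P values, the sketch below is a standard-library implementation of the common two-sided variant, which sums the probabilities of all 2 x 2 tables with the same margins that are no more probable than the observed one. It assumes the tables are formed from the Table II counts, treating the 325 RECIST and 325 mRECIST assessments as two groups; SPSS may define the two-sided P value slightly differently. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(x: int) -> int:
        # Numerator of the hypergeometric probability of a table whose
        # top-left cell is x; the shared denominator is comb(n, col1).
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Exact integer comparison, so ties with the observed table are handled exactly
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, col1)


# Rows: RECIST 1.1, mRECIST; columns: event, no event (from Table II)
p_cr = fisher_exact_two_sided(30, 325 - 30, 185, 325 - 185)          # CR vs non-CR
p_response = fisher_exact_two_sided(140, 325 - 140, 259, 325 - 259)  # CR+PR vs SD+PD
print(p_cr < 0.001, p_response < 0.001)  # True True
```

Using exact integer arithmetic via math.comb avoids the floating-point tolerance that floating-point implementations need when deciding which tables are "as extreme" as the observed one.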

According to the mRECIST criteria, a tumor that was
solid at baseline and became entirely necrotic was
evaluated as CR. In contrast, under the RECIST
version 1.1 criteria, the same necrotic tumor was
evaluated as non-CR on the basis of the measurement
of the entire lesion, leading to a different conclusion
such as PR, SD, or PD (Figure 2). Among the 185
lesions evaluated as CR using the mRECIST criteria,
155 (83.8%) were evaluated as non-CR using the
RECIST version 1.1 criteria. In particular, two
lesions evaluated as CR using the mRECIST criteria
were categorized as PD using the RECIST version 1.1
criteria, so the two sets of criteria reached opposite
conclusions (Table III). These two lesions were very
small, and the 20% increase that led to their
classification as PD under RECIST version 1.1 was
thought to be within the range of measurement error.
In some cases, such a discrepancy might be caused
by an increase in the size of the necrotic tumor
secondary to chemoembolization. Therefore, the
concordance between RECIST version 1.1 and
mRECIST may be low for loco-regional therapies
that achieve complete tumor necrosis.

Table IV. Inter-observer reproducibility.

Criteria    Kappa (95% CI)           Proportion of agreement (%)
RECIST      0.628 (0.571-0.684)      78.8
mRECIST     0.829 (0.792-0.866)      90.0

The differences in the CR rate and the response
rate between the RECIST version 1.1 and mRECIST
criteria indicate that researchers should check
carefully for the presence or absence of the 'm'
(mRECIST or RECIST) when interpreting reported
tumor responses.

Inter- and intra-observer reproducibility 

Standardized tumor response evaluation systems are
considered reliable in clinical trials when they are
reproducible among different observers. The
importance of inter-observer reproducibility for any
classification scheme has been discussed previously
for other grading systems (10-14). Clinical
investigators must take inter-observer reproducibility
into account in tumor response evaluations, because
it can greatly affect the results of clinical trials.

Table V. Intra-observer reproducibility.

Criteria    Kappa (95% CI)           Proportion of agreement (%)
RECIST      0.643 (0.565-0.722)      79.4
mRECIST     0.900 (0.858-0.942)      94.2

Figure 2. A: CT before TACE: With both sets of criteria (RECIST
version 1.1 and mRECIST), the longest diameter of the tumor was
measured. B: CT after TACE: The tumor had become entirely
necrotic. The tumor response was evaluated as CR using the
mRECIST criteria (i.e. no measurement) and as non-CR using the
RECIST version 1.1 criteria (i.e. the longest diameter of the entire
tumor was measured).



In our inter- and intra-observer reproducibility
study, the kappa values and the proportions of
agreement for the mRECIST criteria ('almost perfect
agreement') were higher than those for the RECIST
version 1.1 criteria ('substantial agreement'). Given
this high inter- and intra-observer reproducibility,
mRECIST may be preferable as tumor response
criteria in clinical trials of TACE for HCC.

The present study had several limitations. The
number of patients was relatively small, and the
analyses were performed on a per-lesion rather than
a per-patient basis. In addition, to determine which
set of criteria is better suited for tumor response
evaluation in clinical trials of TACE for HCC, only
an observer consistency study (inter- and
intra-observer reproducibility of the two updated
sets of criteria) was performed. A validation study
comparing the updated criteria with a gold standard
(i.e. overall survival) should be conducted in future
studies.

In conclusion, given the differences in the CR rate
and the response rate between the RECIST version
1.1 and mRECIST criteria, close attention must be
paid to which criteria were used in order to interpret
tumor response outcomes precisely. Furthermore,
from the viewpoint of its high inter- and
intra-observer reproducibility, mRECIST may be
more suitable than RECIST version 1.1 as tumor
response criteria in clinical trials of TACE for HCC.